Unsupervised speaker normalization using canonical correlation analysis
Authors
Abstract
Conventional speaker-independent HMMs ignore speaker differences and collect the speech data in a single observation space. As a result, the output probability distributions of the HMMs become vague, which degrades recognition accuracy. To solve this problem, we construct a speaker subspace for each individual speaker and correlate the subspaces of the standard speaker and the input speaker by o-space canonical correlation analysis. Supervised normalization requires input speakers to speak the same sentences as the standard speaker. To remove this constraint, we propose in this paper an unsupervised speaker normalization method that automatically segments the speech data into phoneme data with the Viterbi decoding algorithm and then associates the mean feature vectors of the phoneme data by o-space canonical correlation analysis. We show that the phoneme recognition rate of this unsupervised method is equivalent to that of the supervised normalization method we previously proposed.
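The association step described in the abstract can be illustrated with a minimal sketch. It assumes that a frame-level phoneme label is already available for each feature frame (for example from a Viterbi alignment, which is not implemented here) and uses ordinary linear canonical correlation analysis computed with NumPy. The function names `phoneme_means` and `cca`, the `reg` parameter, and the array layout are hypothetical choices made for illustration; they are not taken from the paper and may differ from the o-space variant the authors use.

```python
import numpy as np


def phoneme_means(frames, labels, phonemes):
    """Average the feature frames assigned to each phoneme.

    frames:   (n_frames, dim) array of feature vectors
    labels:   (n_frames,) array of phoneme labels, one per frame
              (assumed to come from an alignment step)
    phonemes: ordered list of phoneme labels to keep
    Returns an (n_phonemes, dim) array with one mean vector per phoneme.
    """
    return np.stack([frames[labels == p].mean(axis=0) for p in phonemes])


def cca(X, Y, reg=1e-6):
    """Ordinary linear CCA between paired rows of X (n, p) and Y (n, q).

    Returns (A, B, corr): the columns of A and B are canonical directions
    for X and Y, and corr holds the canonical correlations in
    descending order.
    """
    n = X.shape[0]
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)

    # Sample covariance matrices; a small ridge term keeps them invertible.
    Sxx = Xc.T @ Xc / (n - 1) + reg * np.eye(X.shape[1])
    Syy = Yc.T @ Yc / (n - 1) + reg * np.eye(Y.shape[1])
    Sxy = Xc.T @ Yc / (n - 1)

    # Whiten both sides with Cholesky factors: Sxx = Lx Lx^T, Syy = Ly Ly^T.
    Lx = np.linalg.cholesky(Sxx)
    Ly = np.linalg.cholesky(Syy)

    # M = Lx^{-1} Sxy Ly^{-T}; its singular values are the correlations.
    N = np.linalg.solve(Lx, Sxy)
    M = np.linalg.solve(Ly, N.T).T
    U, s, Vt = np.linalg.svd(M, full_matrices=False)

    # Map the singular vectors back to the original feature spaces.
    A = np.linalg.solve(Lx.T, U)
    B = np.linalg.solve(Ly.T, Vt.T)
    return A, B, np.clip(s, 0.0, 1.0)
```

In this sketch, `X` would hold the per-phoneme mean vectors of the standard speaker and `Y` those of the input speaker, with row `i` of both arrays referring to the same phoneme. Pairing by phoneme rather than by utterance is what would let the input speaker say different sentences from the standard speaker.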
Similar resources
Extended Variability Modeling and Unsupervised Adaptation for PLDA Speaker Recognition
Probabilistic Linear Discriminant Analysis (PLDA) continues to be the most effective approach for speaker recognition in the i-vector space. This paper extends the PLDA model to include both enrollment and test cut duration as well as to distinguish between session and channel variability. In addition, we address the task of unsupervised adaptation to unknown new domains in two ways: speaker-de...
Vocal Tract Length Normalization for Large Vocabulary Continuous Speech Recognition
Generally speaking, the speaker-dependence of a speech recognition system stems from speaker-dependent speech features. The variation of vocal tract length and/or shape is one of the major sources of inter-speaker variations. In this paper, we address several methods of vocal tract length normalization (VTLN) for large vocabulary continuous speech recognition: (1) explore the bilinear warping VTL...
Improved Speaker Markov Modelling for Unsupervised Speaker Normalization
We propose new methods for improving speech recognition with speaker-variable information. Hidden Markov Model-based recognizers that are trained on reference speaker(s) (RS) are normalized by our two different approaches to give a better speaker-independent recognition rate. Our normalization methods are based on the same principle of inter-speaker Markov mapping. This mapping gives inter-speak...
Unsupervised Speaker Adaptation based on the Cosine Similarity for Text-Independent Speaker Verification
This paper proposes a new approach to unsupervised speaker adaptation inspired by the recent success of the factor analysis-based Total Variability Approach to text-independent speaker verification [1]. This approach effectively represents speaker variability in terms of low-dimensional total factor vectors and, when paired alongside the simplicity of cosine similarity scoring, allows for easy m...
Eliminating Inter-speaker Variability Prior to Discriminant Transforms
This paper shows the impact of speaker normalization techniques such as vocal tract length normalization (VTLN) and speaker-adaptive training (SAT) prior to discriminant feature space transforms, such as LDA. We demonstrate that removing the inter-speaker variability by using speaker compensation methods results in improved discrimination as measured by the LDA eigenvalues and also in improved ...